Cherry Blossom Peak Bloom Prediction 2026

Team 5103 — University of Maryland

Datasets

5 Competition sites

| Site          | Records | Span      |
|---------------|---------|-----------|
| Kyoto         | 837     | 812–2025  |
| Washington DC | 105     | 1921–2025 |
| Liestal       | 132     | 1894–2025 |
| Vancouver     | 4       | 2022–2025 |
| New York City | 2       | 2019–2025 |

3 Auxiliary sources: Japan regional (6,573), MeteoSwiss (6,642), South Korea (994)

USA-NPN enrichment — 5 extra NYC records from Washington Square Park citizen-science observations (species 228, phenophase 501)

Total: 14,598 bloom records

Enhanced features (from master dataset + NOAA CDO)

| Feature                       | Type                 |
|-------------------------------|----------------------|
| Winter / Spring mean temp     | Meteorological       |
| GDD Jan-Feb, GDD winter       | Thermal accumulation |
| Chill hours (winter, Nov-Dec) | Dormancy requirement |
| ONI, NAO, PDO indices         | Macro-climate        |
| Hopkins bioclimatic index     | Phenological         |
| Photoperiod (Mar 20)          | Astronomical         |
| Lat, Long, Alt (log)          | Geographic           |
| Year (centered + quadratic)   | Temporal trend       |

Models & Methodology

Model A — Local Trend (per site)

  • Recency-weighted quadratic: bloom_doy ~ year + year²
  • Exponential decay weights: \(w_i = e^{(i-n)/6}\), e-folding time 6 yr (half-life ≈ 4.2 yr)
  • Captures site-specific momentum from long local histories
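
A minimal sketch of the recency-weighted quadratic fit described above. It assumes the weights enter as ordinary weighted least squares, that records are sorted oldest to newest, and that years are centered before squaring. The function names (`recency_weights`, `solve3`, `fit_weighted_quadratic`) are hypothetical and not taken from the team's code.

```python
import math

def recency_weights(n, tau=6.0):
    """w_i = exp((i - n) / tau) for i = 1..n; the newest record gets weight 1."""
    return [math.exp((i - n) / tau) for i in range(1, n + 1)]

def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        s = sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (m[r][3] - s) / m[r][r]
    return x

def fit_weighted_quadratic(years, doys, tau=6.0):
    """Weighted least squares for doy ~ b0 + b1*t + b2*t^2, with t = year - mean(year).

    Assumes `years` is sorted ascending so list position reflects recency.
    Returns a function mapping a calendar year to a predicted bloom DOY.
    """
    center = sum(years) / len(years)
    w = recency_weights(len(years), tau)
    rows = [[1.0, y - center, (y - center) ** 2] for y in years]
    # Weighted normal equations: (X^T W X) beta = X^T W y
    xtwx = [[sum(wi * r[j] * r[k] for wi, r in zip(w, rows)) for k in range(3)]
            for j in range(3)]
    xtwy = [sum(wi * r[j] * d for wi, r, d in zip(w, rows, doys)) for j in range(3)]
    beta = solve3(xtwx, xtwy)

    def predict(year):
        t = year - center
        return beta[0] + beta[1] * t + beta[2] * t * t

    return predict
```

With a decay constant of 6, the record six positions before the newest carries weight e^{-1} ≈ 0.37, so the fit is dominated by roughly the last decade of observations.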

Model B — Pooled GAM (R pipeline)

\[\text{DOY} \sim s(\text{year}) + s(\text{lat}, \text{long}) + s(\text{alt}) + s(\text{obs}) + \text{climate} + \text{source}\]

  • REML on 14,598 records
  • NOAA CDO covariates: winter/spring temp, GDD proxy

Model B′ — Gradient Boosted Trees (Python pipeline)

  • GBR with Huber loss, 800 trees, lr = 0.015
  • Enhanced features: GDD, chill hours, ONI/NAO/PDO, Hopkins index
  • Stochastic boosting (subsample=0.8, max_features=√p)

Ensemble blending

  • Rolling-origin backtest (1900–2025, ~126 windows)
  • Site-specific optimal weights via grid search (w ∈ [0, 1] by 0.02)
  • Sites with deep history → heavier Model A; sparse sites → Model B
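
The per-site weight search above can be sketched as follows. This assumes the blend is a convex combination `w * local + (1 - w) * pooled` and that backtest MAE is the selection criterion; `mae` and `best_blend_weight` are illustrative names, not the team's functions.

```python
def mae(preds, actuals):
    """Mean absolute error in days."""
    return sum(abs(p - a) for p, a in zip(preds, actuals)) / len(actuals)

def best_blend_weight(local_preds, pooled_preds, actuals, step=0.02):
    """Grid-search w in [0, 1]; blend = w * local + (1 - w) * pooled.

    The three lists hold one site's backtest predictions and observed DOYs,
    aligned by backtest window.
    """
    n_steps = round(1 / step)
    best_w, best_err = 0.0, float("inf")
    for k in range(n_steps + 1):
        w = k / n_steps  # integer grid avoids floating-point drift
        blend = [w * a + (1 - w) * b for a, b in zip(local_preds, pooled_preds)]
        err = mae(blend, actuals)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err
```

Running this once per site yields a weight near 1 where the local trend backtests well and a weight near 0 where the pooled model does better.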

Prediction intervals

  • Split-conformal: 90th percentile of backtest |residuals| per site
  • ≥90% empirical coverage, tight enough for the sum-of-squared-widths (SSW) tiebreaker
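
A small sketch of the interval construction above, assuming a symmetric interval around the point forecast and a plain empirical quantile of the absolute backtest residuals. Textbook split-conformal uses the slightly more conservative rank ceil((n + 1) · 0.9); the version here follows the wording of the bullet. Function names are hypothetical.

```python
import math

def conformal_half_width(abs_residuals, coverage=0.90):
    """Empirical quantile of |residuals| at the requested coverage level."""
    ordered = sorted(abs_residuals)
    # Smallest rank k such that at least `coverage` of the residuals are <= ordered[k-1].
    # round() guards against float noise such as 0.9 * n landing just above an integer.
    k = math.ceil(round(coverage * len(ordered), 9))
    return ordered[max(k, 1) - 1]

def prediction_interval(point_doy, abs_residuals, coverage=0.90):
    """Symmetric interval (lower, upper) in DOY units for one site."""
    q = conformal_half_width(abs_residuals, coverage)
    return point_doy - q, point_doy + q
```

Because the quantile is computed per site, a site with noisy backtest residuals gets a wider interval than one with a stable history.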

Cross-language guard

  • R (GAM) and Python (GBR) run independently
  • If avg gap ≤ 4 days → blend both; else default to R
  • Current gap: 2.8 days → blended
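
The guard logic above, as a sketch. It assumes "avg gap" means the mean absolute difference between the two pipelines' predictions across sites and that blending is a simple average; `guard_blend` and its arguments are illustrative names.

```python
def guard_blend(r_preds, py_preds, max_gap=4.0):
    """Average R and Python predictions if their mean absolute gap is small.

    r_preds, py_preds: dicts mapping site name -> predicted DOY.
    Returns (final predictions, average gap in days).
    """
    sites = sorted(r_preds)
    avg_gap = sum(abs(r_preds[s] - py_preds[s]) for s in sites) / len(sites)
    if avg_gap <= max_gap:
        final = {s: (r_preds[s] + py_preds[s]) / 2 for s in sites}
    else:
        final = dict(r_preds)  # fall back to the R pipeline
    return final, avg_gap
```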

Backtest Results

Backtest MAE comparison

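| Model                  | MAE (days) | MAE reduction vs baseline |
|------------------------|------------|---------------------------|
| Local trend only       | 7.27       |                           |
| Baseline GAM ensemble  | ~5.61      |                           |
| Enhanced global (GBR)  | 4.52       | 19%                       |
| Enhanced ensemble      | 4.23       | 25%                       |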
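
The percentages in the comparison can be reproduced from the MAE column, assuming they are relative reductions against the ~5.61-day baseline. `pct_improvement` is a hypothetical helper name.

```python
def pct_improvement(baseline_mae, model_mae):
    """Percent reduction in MAE relative to the baseline (positive = better)."""
    return 100 * (baseline_mae - model_mae) / baseline_mae

baseline = 5.61  # approximate baseline GAM ensemble MAE from the table
gbr = pct_improvement(baseline, 4.52)
ensemble = pct_improvement(baseline, 4.23)
print(f"GBR: {gbr:.1f}%  ensemble: {ensemble:.1f}%")
```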

Key improvements:

  • GDD + chill hours capture phenological drivers directly
  • Climate indices (ONI, NAO, PDO) explain year-to-year variability
  • Stochastic boosting + deeper trees capture complex interactions
  • Site-specific weights outperform global inverse-MAE

Inferential Factors

Global-scale drivers

  1. Rising temperatures: about 1.1 °C of warming since pre-industrial. Warmer winters slow chill-hour accumulation, while warmer springs accelerate GDD accumulation → net effect is earlier bloom.

  2. ENSO: La Niña cools the Pacific NW (delays Vancouver) and brings milder Eastern US winters (advances DC/NYC). 2025-26 transition to neutral → near-normal timing.

  3. Latitude gradient: ~1 day later per degree N above 35°N. The GAM's lat-long smooth captures this.

  4. Altitude penalty: ~2 days later per 100 m of elevation. Liestal (350 m) is systematically later than sea-level sites.

Site-specific drivers

  • Kyoto: Urban heat island + 0.2 d/century advancement. 1200-yr record stabilises local model.

  • Washington DC: Tidal Basin thermal buffer → earliest bloomer. 2.2 d/decade recent acceleration.

  • Liestal: Alpine amplified warming (0.3 °C/decade). Foehn winds cause stochastic early bloom → widest variability.

  • Vancouver: PDO phase modulates decadal variability. Maritime buffering keeps volatility moderate.

  • NYC: Shortest record (12 yrs) → highest uncertainty. Continental cold snaps reset GDD accumulation. NPN data fusion reduces LOO MAE by ~1.5 days.

2026 Predictions

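| City          | DOY | Date         | Interval        | Width (days) |
|---------------|-----|--------------|-----------------|--------------|
| Washington DC | 84  | Mar 25, 2026 | Mar 17 – Apr 01 | 15           |
| Liestal       | 86  | Mar 27, 2026 | Mar 19 – Apr 05 | 17           |
| Kyoto         | 88  | Mar 29, 2026 | Mar 20 – Apr 07 | 18           |
| Vancouver     | 90  | Mar 31, 2026 | Mar 21 – Apr 10 | 20           |
| New York City | 92  | Apr 02, 2026 | Mar 25 – Apr 09 | 15           |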
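
For checking the table, day-of-year values convert to calendar dates with the standard library. This sketch assumes DOY 1 is January 1 and that width is the difference in days between the interval endpoints; the helper names are illustrative.

```python
from datetime import date, timedelta

def doy_to_date(doy, year=2026):
    """Convert a 1-based day of year to a calendar date."""
    return date(year, 1, 1) + timedelta(days=doy - 1)

def interval_width(lower, upper):
    """Width in days between two calendar dates."""
    return (upper - lower).days

dc_date = doy_to_date(84)
dc_width = interval_width(date(2026, 3, 17), date(2026, 4, 1))
```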


| Metric                      | Value               |
|-----------------------------|---------------------|
| Backtest MAE (enhanced)     | 4.23 days           |
| R vs Python gap             | 2.8 days            |
| Sum of squared widths (SSW) | 1463                |
| Cross-site spread           | 8 days              |
| Blend method                | Averaged R + Python |
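
The SSW and spread figures follow directly from the per-city predictions listed earlier. The sketch below assumes "cross-site spread" means the latest minus the earliest predicted DOY.

```python
# Interval widths (days) and predicted DOYs from the 2026 predictions table
widths = {"Washington DC": 15, "Liestal": 17, "Kyoto": 18,
          "Vancouver": 20, "New York City": 15}
doys = {"Washington DC": 84, "Liestal": 86, "Kyoto": 88,
        "Vancouver": 90, "New York City": 92}

ssw = sum(w ** 2 for w in widths.values())          # sum of squared widths
spread = max(doys.values()) - min(doys.values())    # latest minus earliest site
print(ssw, spread)
```

Because widths are squared, one wide interval costs more than several moderately wide ones: Vancouver's 20-day interval alone contributes 400 of the total.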

Enhanced vs Baseline

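| City          | Baseline DOY (width) | Enhanced DOY (width) | Shift |
|---------------|----------------------|----------------------|-------|
| Kyoto         | 88 (18 d)            | 94 (12 d)            | +6 d  |
| Liestal       | 86 (17 d)            | 95 (10 d)            | +9 d  |
| New York City | 92 (15 d)            | 98 (11 d)            | +6 d  |
| Vancouver     | 90 (20 d)            | 95 (25 d)            | +5 d  |
| Washington DC | 84 (15 d)            | 89 (13 d)            | +5 d  |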


What the enhanced model adds:

  • GDD + chill hours → direct phenological drivers
  • ONI / NAO / PDO → macro-climate variability
  • Hopkins index + photoperiod → bioclimatic baselines
  • Deeper trees (depth 4) capture feature interactions
  • Stochastic boosting reduces overfitting

Why enhanced predictions are later:

  • Better calibration of spring warming accumulation
  • Chill-hour accounting prevents premature bloom calls
  • Climate indices correctly dampen warm-bias years

Tighter intervals (14.2 d avg vs 17.0 d baseline, averaging the widths in the comparison table):

  • More features → lower residual variance
  • Better site-specific interval calibration
  • Exception: Vancouver (25 d) — only 4 years of data



Code, data, and outputs: github.com/GMU-CherryBlossomCompetition

Interactive dashboard

quarto render solution.qmd                           # R pipeline
python Solution_Enhanced_v2.py                        # Enhanced Python
quarto render dashboard.qmd                           # Dashboard